A generalized K statistic for estimating phylogenetic signal from shape and other high-dimensional multivariate data.
Abstract
Phylogenetic signal is the tendency for closely related species to display similar trait values due to their common ancestry. Several methods have been developed for quantifying phylogenetic signal in univariate traits and for sets of traits treated simultaneously, and the statistical properties of these approaches have been extensively studied. However, methods for assessing phylogenetic signal in high-dimensional multivariate traits like shape are less well developed, and their statistical performance is not well characterized. In this article, I describe a generalization of the K statistic of Blomberg et al. that is useful for quantifying and evaluating phylogenetic signal in high-dimensional multivariate data. The method (K(mult)) is derived from the equivalency between statistical methods based on covariance matrices and those based on distance matrices. Using computer simulations based on Brownian motion, I demonstrate that the expected value of K(mult) remains at 1.0 as trait variation among species is increased or decreased, and as the number of trait dimensions is increased. By contrast, estimates of phylogenetic signal found with a squared-change parsimony procedure for multivariate data change with increasing trait variation among species and with increasing numbers of trait dimensions, confounding biological interpretations. I also evaluate the statistical performance of hypothesis testing procedures based on K(mult) and find that the method displays appropriate Type I error and high statistical power for detecting phylogenetic signal in high-dimensional data. Statistical properties of K(mult) were consistent for simulations using bifurcating and random phylogenies, for simulations using different numbers of species, for simulations that varied the number of trait dimensions, and for different underlying models of trait covariance structure. Overall, these findings demonstrate that K(mult) provides a useful means of evaluating phylogenetic signal in high-dimensional multivariate traits. Finally, I illustrate the utility of the new approach by evaluating the strength of phylogenetic signal for head shape in a lineage of Plethodon salamanders.
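The abstract does not state the formula, so the Python sketch below is an illustration under stated assumptions, not the author's implementation. It assumes K(mult) can be written as the sum of squared Euclidean distances of species from the phylogenetic mean, divided by the same quantity weighted by the inverse phylogenetic covariance matrix, and then scaled by the Brownian-motion expectation used in Blomberg et al.'s K. The function names (`k_mult`, `simulate_bm`, `k_mult_perm_test`), the covariance matrix `C_demo`, and the row-permutation test are hypothetical choices made for this example.

```python
import numpy as np


def k_mult(Y, C):
    """Sketch of a multivariate K statistic (assumed formulation).

    Y: (n_species, n_dims) trait matrix.
    C: (n_species, n_species) phylogenetic covariance matrix
       expected under Brownian motion.
    """
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:                      # allow a single trait
        Y = Y[:, None]
    C = np.asarray(C, dtype=float)
    n = Y.shape[0]
    C_inv = np.linalg.inv(C)
    s = C_inv.sum()                      # 1' C^-1 1
    root = (C_inv.sum(axis=0) @ Y) / s   # GLS estimate of the root state
    R = Y - root                         # deviations from the phylogenetic mean
    obs = np.sum(R ** 2)                 # squared distances, ignoring phylogeny
    phy = np.sum(R * (C_inv @ R))        # squared distances, weighted by C^-1
    expected = (np.trace(C) - n / s) / (n - 1)  # expected ratio under BM
    return (obs / phy) / expected


def simulate_bm(C, n_dims, sigma2=1.0, rng=None):
    """Simulate independent Brownian-motion traits with covariance sigma2 * C."""
    rng = np.random.default_rng() if rng is None else rng
    C = np.asarray(C, dtype=float)
    L = np.linalg.cholesky(C)            # C = L L'
    Z = rng.normal(scale=np.sqrt(sigma2), size=(C.shape[0], n_dims))
    return L @ Z


def k_mult_perm_test(Y, C, n_perm=999, seed=0):
    """Permutation test (assumed scheme): shuffle species rows across tips."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    observed = k_mult(Y, C)
    count = 1                            # the observed value counts once
    for _ in range(n_perm):
        permuted = Y[rng.permutation(Y.shape[0])]
        if k_mult(permuted, C) >= observed:
            count += 1
    return observed, count / (n_perm + 1)


# Hypothetical 4-species balanced tree ((A,B),(C,D)) with unit depth.
C_demo = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])
Y_demo = simulate_bm(C_demo, n_dims=6, rng=np.random.default_rng(1))
k_obs, p_val = k_mult_perm_test(Y_demo, C_demo, n_perm=199, seed=1)
```

Under this formulation the statistic is unchanged when all trait values are rescaled, and a star phylogeny (`C` equal to the identity matrix) gives exactly 1 for any data, which makes a convenient sanity check.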
Similar resources
A method for assessing phylogenetic least squares models for shape and other high-dimensional multivariate data.
Studies of evolutionary correlations commonly use phylogenetic regression (i.e., independent contrasts and phylogenetic generalized least squares) to assess trait covariation in a phylogenetic context. However, while this approach is appropriate for evaluating trends in one or a few traits, it is incapable of assessing patterns in highly multivariate data, as the large number of variables relat...
Detecting taxonomic and phylogenetic signals in equid cheek teeth: towards new palaeontological and archaeological proxies
The Plio-Pleistocene evolution of Equus and the subsequent domestication of horses and donkeys remains poorly understood, due to the lack of phenotypic markers capable of tracing this evolutionary process in the palaeontological/archaeological record. Using images from 345 specimens, encompassing 15 extant taxa of equids, we quantified the occlusal enamel folding pattern in four mandibular chee...
Estimating Algorithms for Prediction and Spread of a Factor as a Pandemic: A Case Study of Global COVID-19 Prevalence
Background: This paper presents open-source computer simulation programs developed for simulating, tracking, and estimating the COVID-19 outbreak. Methods: The programs consisted of two separate parts: one set of programs built in Simulink with a block diagram display, and another one coded in MATLAB as scripts. The mathematical model used in this package was the SIR, SEIR, and SEIRD models re...
High dimensional data analysis using multivariate generalized spatial quantiles
High dimensional data routinely arises in image analysis, genetic experiments, network analysis, and various other research areas. Many such datasets do not correspond to well-studied probability distributions, and in several applications the data-cloud prominently displays non-symmetric and non-convex shape features. We propose using spatial quantiles and their generalizations, in particular, ...
Comparison and evaluation of the performance of data-driven models for estimating suspended sediment downstream of Doroodzan Dam
Dams control most of the sediment entering the reservoir by creating static environments. However, sediment leaving the dam depends on various factors such as dam management method, inlet sediment, water height in the reservoir, the shape of the reservoir, and discharge flow. In this research, the amount of suspended sediment of Doroodzan Dam based on a statistical period of 25 years has been i...
Journal title:
- Systematic Biology
Volume 63, Issue 5
Pages: -
Publication date: 2014